
perf(conversion): source start/end events from profile_event_summary_mv when eligible #298

Open
ayushjhanwar-png wants to merge 1 commit into main from perf/conversion-use-profile-event-summary-mv

@ayushjhanwar-png

Summary

Conversion charts on plain event names (no per-event filters, no breakdown or hold properties) now source the `start_events_raw` and `end_events_raw` CTEs from the existing `profile_event_summary_mv` instead of scanning the raw `events` table.

The MV is already pre-aggregated by `(project_id, profile_id, name, event_date)`, with `first_event_time` stored as an AggregateFunction. That is exactly the shape downstream needs (`min(created_at)` per profile per day). For high-traffic projects on wide time windows, reading from the MV avoids scanning hundreds of millions of raw rows.
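Why the per-day MV is enough: the minimum of per-block minimums equals the minimum over all raw rows, so merging the MV's partial states per `(profile_id, name, event_date)` recovers `min(created_at)` per profile per day. Below is a small in-memory TypeScript sketch of that argument, not the PR's code. It assumes `first_event_time` holds a `min` state, uses plain numbers in place of ClickHouse aggregate states, and all names (`RawEvent`, `summarizeBlock`, `firstPerProfileDayFromSummary`, etc.) are hypothetical.

```typescript
// Hypothetical, minimal event shape: only the fields this argument needs.
interface RawEvent {
  profileId: string;
  name: string;
  createdAt: number; // epoch milliseconds (UTC)
}

// Analogue of one MV row: key (profile_id, name, event_date) plus a min() state.
interface SummaryState {
  profileId: string;
  name: string;
  day: number;
  first: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Stand-in for event_date, assuming UTC day boundaries.
const dayOf = (t: number): number => Math.floor(t / DAY_MS);

// Keeps the smaller timestamp per key (the min aggregate).
function keepMin(out: Map<string, number>, key: string, t: number): void {
  const prev = out.get(key);
  if (prev === undefined || t < prev) out.set(key, t);
}

// Raw path: min(created_at) per (profile, day) for one event name.
function firstPerProfileDayRaw(events: RawEvent[], name: string): Map<string, number> {
  const out = new Map<string, number>();
  for (const e of events) {
    if (e.name === name) keepMin(out, JSON.stringify([e.profileId, dayOf(e.createdAt)]), e.createdAt);
  }
  return out;
}

// MV insert: each inserted block yields partial min states per (profile, name, day).
function summarizeBlock(block: RawEvent[]): SummaryState[] {
  const states = new Map<string, SummaryState>();
  for (const e of block) {
    const day = dayOf(e.createdAt);
    const key = JSON.stringify([e.profileId, e.name, day]);
    const s = states.get(key);
    if (s === undefined) states.set(key, { profileId: e.profileId, name: e.name, day, first: e.createdAt });
    else if (e.createdAt < s.first) s.first = e.createdAt;
  }
  return Array.from(states.values());
}

// MV read: merge partial states across blocks (the minMerge analogue) for one name.
function firstPerProfileDayFromSummary(blocks: RawEvent[][], name: string): Map<string, number> {
  const out = new Map<string, number>();
  for (const block of blocks) {
    for (const s of summarizeBlock(block)) {
      if (s.name === name) keepMin(out, JSON.stringify([s.profileId, s.day]), s.first);
    }
  }
  return out;
}

// Usage: the same events, split across two insert blocks so one profile-day
// has partial states in both blocks.
const events: RawEvent[] = [
  { profileId: "p1", name: "appOpen", createdAt: 1000 },
  { profileId: "p1", name: "appOpen", createdAt: 500 },
  { profileId: "p1", name: "appOpen", createdAt: DAY_MS + 10 },
  { profileId: "p2", name: "appOpen", createdAt: 2000 },
  { profileId: "p2", name: "signup", createdAt: 100 },
];
const raw = firstPerProfileDayRaw(events, "appOpen");
const fromMv = firstPerProfileDayFromSummary([events.slice(0, 1), events.slice(1)], "appOpen");
```

Splitting the events so that `p1`'s day-0 minimum only appears after merging two blocks is what makes the comparison meaningful: it mirrors rows landing in different parts before ClickHouse merges them.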

Eligibility

A single event's CTE uses the MV path only when:

  • No per-event filters (the MV doesn't store property values).
  • No extra columns (breakdown / hold properties only live on `events`).
  • `groupCol === 'profile_id'` (the MV is profile-keyed; session-based conversions stay on the events table).
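The three rules reduce to one predicate. A minimal TypeScript sketch, assuming hypothetical names (`ConversionEventSpec`, `canUseSummaryMv`, `extraColumns`); only `groupCol` and its `'profile_id'` value come from this PR.

```typescript
// Hypothetical shape of one conversion step; field names are illustrative.
interface ConversionEventSpec {
  name: string;
  filters: unknown[]; // per-event property filters
}

type GroupCol = "profile_id" | "session_id";

// True only when every MV eligibility rule holds.
function canUseSummaryMv(
  event: ConversionEventSpec,
  extraColumns: string[], // breakdown / hold property columns
  groupCol: GroupCol,
): boolean {
  return (
    event.filters.length === 0 && // the MV stores no property values
    extraColumns.length === 0 && // breakdown / hold props only live on `events`
    groupCol === "profile_id" // the MV is keyed by profile, not session
  );
}

// Usage:
// canUseSummaryMv({ name: "appOpen", filters: [] }, [], "profile_id");     // true
// canUseSummaryMv({ name: "appOpen", filters: [] }, ["os"], "profile_id"); // false
```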

If any of these conditions fails, the existing `events`-table path is used unchanged. The MV branch emits `session_id` as `''`; downstream `any(session_id)` accepts that, and current callers don't read `session_id` when grouping by `profile_id`.
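To make the two branches concrete, here is a hedged TypeScript sketch of a CTE builder. It is not the PR's builder: `buildEventsCte`, `CteInput`, and the parameter names are hypothetical, the `events` columns `project_id` and `name` are assumed, and it assumes `first_event_time` is a `min` state (so `minMerge` recovers `min(created_at)`) and that `event_date` corresponds to `toDate(created_at)`. Values are bound with ClickHouse query parameters (`{name:Type}`) rather than interpolated.

```typescript
// Hypothetical input; the PR's real builder may be structured differently.
interface CteInput {
  cteName: "start_events_raw" | "end_events_raw";
  useSummaryMv: boolean; // result of the eligibility check
}

// Returns one CTE definition as a SQL string.
function buildEventsCte({ cteName, useSummaryMv }: CteInput): string {
  if (useSummaryMv) {
    // MV branch: one row per profile per day. minMerge() turns the stored
    // min() states back into timestamps; session_id is emitted as ''.
    // The MV can only be filtered at day granularity (event_date).
    return `${cteName} AS (
  SELECT profile_id, '' AS session_id, minMerge(first_event_time) AS created_at
  FROM profile_event_summary_mv
  WHERE project_id = {projectId:String}
    AND name = {eventName:String}
    AND event_date BETWEEN {startDate:Date} AND {endDate:Date}
  GROUP BY profile_id, event_date
)`;
  }
  // Events branch (existing path): raw rows. Per-event filters and
  // breakdown / hold columns would be added here in the real builder.
  return `${cteName} AS (
  SELECT profile_id, session_id, created_at
  FROM events
  WHERE project_id = {projectId:String}
    AND name = {eventName:String}
    AND toDate(created_at) BETWEEN {startDate:Date} AND {endDate:Date}
)`;
}
```

Because the downstream step still takes `min(...)` and `any(session_id)` per profile, feeding it one pre-minimized row per profile per day (MV) or many raw rows (events) yields the same result.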

Measured impact

1-month shortreels `appOpen` conversion:

| Path | Data read | Duration |
|---|---|---|
| `events` table | ~28 GiB | 10–40 s (frequent 40 s timeouts) |
| MV | ~6 GiB | ~600 ms |

Numbers match the `events` table exactly. Verified side-by-side via `uniq(profile_id)` and `countMerge(event_count)`:

```
events: 571410 profiles, 2244741 events
MV: 571410 profiles, 2244741 events
```
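The side-by-side check can be reproduced with two queries. A TypeScript sketch that builds them as strings; `buildVerificationQueries`, the query parameters, and the `toDate(created_at)` ↔ `event_date` mapping are assumptions, while the table and column names come from this PR. Note that ClickHouse `uniq` is an approximate (though deterministic) distinct count; swapping in `uniqExact` turns the profile comparison into an exact one.

```typescript
// Hypothetical helper returning the two side-by-side check queries.
interface VerificationQueries {
  events: string;
  mv: string;
}

function buildVerificationQueries(): VerificationQueries {
  const common = `project_id = {projectId:String}
  AND name = {eventName:String}`;
  return {
    // Raw table: distinct profiles and total rows for the event in the window.
    events: `SELECT uniq(profile_id) AS profiles, count() AS events
FROM events
WHERE ${common}
  AND toDate(created_at) BETWEEN {startDate:Date} AND {endDate:Date}`,
    // MV: event_count holds count() states, so countMerge() sums them back.
    mv: `SELECT uniq(profile_id) AS profiles, countMerge(event_count) AS events
FROM profile_event_summary_mv
WHERE ${common}
  AND event_date BETWEEN {startDate:Date} AND {endDate:Date}`,
  };
}
```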

Test plan

  • Plain conversion (e.g. `appOpen → onboarding_complete`) on a 7d / 30d window: returns the same daily counts as before and finishes in <2s.
  • Conversion with a per-event filter (e.g. `appOpen [country = US]`): still uses the events-table path, behaviour unchanged.
  • Conversion with a breakdown (e.g. by `os`): still uses the events-table path, behaviour unchanged.
  • Conversion with a hold property (e.g. `hold by showId`): still uses the events-table path, behaviour unchanged.
  • Conversion with `funnelGroup: session_id`: still uses the events-table path, behaviour unchanged.
  • Conversion where start / end is a custom event: the preceding `if (customEvent)` branch is taken, so the MV path is bypassed.
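The test plan reads naturally as a table of inputs and expected sources. A TypeScript sketch, assuming hypothetical names (`StepSpec`, `chooseSource`, `Source`); the check order (custom event first, then MV eligibility, else the `events` table) follows the bullets above.

```typescript
// Hypothetical step description; names are illustrative, not the PR's types.
interface StepSpec {
  customEvent: boolean; // start/end defined as a custom event
  filters: string[]; // per-event filters, e.g. "country = US"
  extraColumns: string[]; // breakdown / hold property columns
  groupCol: "profile_id" | "session_id";
}

type Source = "custom" | "summary_mv" | "events";

// Custom events are handled first; only then is MV eligibility checked.
function chooseSource(step: StepSpec): Source {
  if (step.customEvent) return "custom";
  const eligible =
    step.filters.length === 0 &&
    step.extraColumns.length === 0 &&
    step.groupCol === "profile_id";
  return eligible ? "summary_mv" : "events";
}

// One row per test-plan bullet: [label, input, expected source].
const plain: StepSpec = { customEvent: false, filters: [], extraColumns: [], groupCol: "profile_id" };
const cases: Array<[string, StepSpec, Source]> = [
  ["plain conversion", plain, "summary_mv"],
  ["per-event filter", { ...plain, filters: ["country = US"] }, "events"],
  ["breakdown by os", { ...plain, extraColumns: ["os"] }, "events"],
  ["hold by showId", { ...plain, extraColumns: ["showId"] }, "events"],
  ["funnelGroup: session_id", { ...plain, groupCol: "session_id" }, "events"],
  ["custom event", { ...plain, customEvent: true }, "custom"],
];

for (const [label, step, expected] of cases) {
  console.log(`${label}: got ${chooseSource(step)}, expected ${expected}`);
}
```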

Notes

  • This is a perf-only change; it doesn't fix any correctness issue. Conversions on plain event names are the common case, though (the slow shortreels appOpen → ... example that triggered this PR is exactly that).
  • The MV used (`profile_event_summary_mv`) is the existing one fed by the standard insertion path; no new MV to maintain.

…mv when eligible

For conversion charts where no per-event filters and no breakdown / hold
property columns are needed, the start_events_raw and end_events_raw
CTEs now read from profile_event_summary_mv instead of scanning the raw
events table.

The MV is pre-aggregated by (project_id, profile_id, name, event_date)
with first_event_time stored as an AggregateFunction — which is exactly
the shape downstream wants (min(created_at) per profile per day).
Pulling from the MV skips reading the full events table for the time
window, which can be hundreds of millions of rows on a 1-month
conversion against a high-traffic project.

Eligibility (otherwise the existing events-table path is used):
- no per-event filters
- no extra columns (breakdown / hold properties live on events table)
- groupCol === 'profile_id' (MV is profile-keyed)

session_id is emitted as '' from the MV path — downstream `any(...)`
accepts that and current callers don't read it when grouping by
profile_id, so behaviour is preserved.

Measured impact on a 1-month shortreels appOpen conversion:
  events table → ~28 GiB read, ~10-40s wall time, frequent 40s timeouts
  MV path      → ~6 GiB read,  ~600 ms wall time

Numbers match the events table exactly (verified via uniq +
countMerge side-by-side).